It is necessary to avoid R variable names like
variable.name, since . in Python is somewhat
analogous to $ in R.
Follow the instructions in the stitches-in-R-setup R
markdown so that this notebook will work. If you’ve followed that
markdown, stitches should be installed in the
r-reticulate virtual environment for this notebook and
should be callable.
There are some redundancies in my Python imports; I should consolidate them in one place and clean up at some point.
Gridded stitching does slow things down. This notebook takes maybe 15 seconds to knit when there are no gridded calculations; adding them increases the knit time to roughly 5 minutes.
You can add eval=FALSE to those blocks (all towards the end) so people can
get a first knit of the notebook quickly.
library(reticulate)
## Warning: package 'reticulate' was built under R version 3.5.2
# indicate that we want to use a specific virtualenv
use_virtualenv("r-reticulate", required = TRUE)
A quick block of code that will only successfully knit if the setup was done correctly:
import numpy as np
print(np.max([2,3]))
## 3
This is a good function to test a stitches installation
with: no arguments needed, it just returns the table of data on pangeo, and
implicitly tests that we can connect to pangeo.
import pandas as pd
pd.set_option('display.max_columns', None)
import stitches
pangeo_table = stitches.fx_pangeo.fetch_pangeo_table()
print(pangeo_table.head())
## activity_id institution_id source_id experiment_id member_id \
## 0 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1
## 1 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1
## 2 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1
## 3 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1
## 4 HighResMIP CMCC CMCC-CM2-HR4 highresSST-present r1i1p1f1
##
## table_id variable_id grid_label \
## 0 Amon ps gn
## 1 Amon rsds gn
## 2 Amon rlus gn
## 3 Amon rlds gn
## 4 Amon psl gn
##
## zstore dcpp_init_year version
## 0 gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
## 1 gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
## 2 gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
## 3 gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
## 4 gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... NaN 20170706
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
We pulled a quick RCP 4.5 run from HectorUI:
hector_tgav <- read.csv('data/Hector-data-2022-09-22.csv', skip=3, stringsAsFactors = FALSE)
hector_exp <- unique(hector_tgav$scenario)
head(hector_tgav)
## scenario year variable value units
## 1 RCP Standard-RCP-4.5 1800 Tgav 0.1176974 degC
## 2 RCP Standard-RCP-4.5 1801 Tgav 0.1212491 degC
## 3 RCP Standard-RCP-4.5 1802 Tgav 0.1243894 degC
## 4 RCP Standard-RCP-4.5 1803 Tgav 0.1272345 degC
## 5 RCP Standard-RCP-4.5 1804 Tgav 0.1298589 degC
## 6 RCP Standard-RCP-4.5 1805 Tgav 0.1323124 degC
This is also a quick back-and-forth demonstration of moving data between R and Python blocks:
import numpy as np
import pandas as pd
tgav_df = pd.DataFrame(r.hector_tgav)
print(np.max(tgav_df[['value']]))
## value 2.552422
## dtype: float64
##
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/numpy/core/fromnumeric.py:84: FutureWarning: In a future version, DataFrame.max(axis=None) will return a scalar max over the entire DataFrame. To retain the old behavior, use 'frame.max(axis=0)' or just 'frame.max()'
## return reduction(axis=axis, out=out, **passkwargs)
and we can compare to the R evaluation
# The hector data we read straight in to R
print(max(hector_tgav$value))
## [1] 2.552422
# R operations on the python data
print(max(py$tgav_df$value))
## [1] 2.552422
First we reshape the Hector time series to match the CMIP6 ESM GSAT smooth anomaly time series that STITCHES operates on. NOTE: we should probably add this as a function to stitches. NOTE: Hector time series start in 1800; there's no science reason we couldn't do matching over that window and have a gridded netcdf with an extra 50 years of history.
hector_tgav %>%
mutate(variable = 'tas',
ensemble = 'hectorUI1',# doesn't affect the matching or stitching
model = 'Hector') %>%
rename(experiment = scenario) %>%
filter(year >= 1850) %>%
select(variable, experiment, ensemble, model, year, value) ->
target_tgav
print(head(target_tgav))
## variable experiment ensemble model year value
## 1 tas RCP Standard-RCP-4.5 hectorUI1 Hector 1850 0.08854785
## 2 tas RCP Standard-RCP-4.5 hectorUI1 Hector 1851 0.09338570
## 3 tas RCP Standard-RCP-4.5 hectorUI1 Hector 1852 0.09959573
## 4 tas RCP Standard-RCP-4.5 hectorUI1 Hector 1853 0.10672673
## 5 tas RCP Standard-RCP-4.5 hectorUI1 Hector 1854 0.11386913
## 6 tas RCP Standard-RCP-4.5 hectorUI1 Hector 1855 0.10890346
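The NOTE above suggests wrapping this reshape in a stitches function. Here is a minimal sketch of what such a helper could look like; shape_target is a hypothetical name, not part of the stitches package, and the miniature input frame below is made up for illustration.

```python
import pandas as pd

def shape_target(df, ensemble="hectorUI1", model="Hector", start_year=1850):
    """Reshape a HectorUI Tgav data frame into the target format stitches
    expects. A hypothetical helper sketching the NOTE above; not part of
    the stitches package."""
    out = df.rename(columns={"scenario": "experiment"}).copy()
    out["variable"] = "tas"
    out["ensemble"] = ensemble   # doesn't affect matching or stitching
    out["model"] = model
    out = out[out["year"] >= start_year]
    return out[["variable", "experiment", "ensemble", "model", "year", "value"]]

# Toy stand-in for the Hector csv read in above
hector = pd.DataFrame({"scenario": ["RCP45"] * 3,
                       "year": [1849, 1850, 1851],
                       "variable": ["Tgav"] * 3,
                       "value": [0.08, 0.09, 0.10],
                       "units": ["degC"] * 3})
print(shape_target(hector))
```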
Now that we have shaped it as stitches is expecting, we can use stitches functions to calculate the matching windows for this target data
import stitches
import pandas as pd
import pkg_resources
import xarray as xr
import numpy as np
pd.set_option('display.max_columns', None)
target_data = stitches.fx_processing.get_chunk_info(
stitches.fx_processing.chunk_ts(df = r.target_tgav, n=9))
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:121: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
## extra_info = df[extra_columns].drop_duplicates()
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:163: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
## fx_dx_info = fx_dx_info.append(row)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_processing.py:171: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
## extra_info["i"] = 1
print(target_data.head())
## ensemble experiment variable model start_yr end_yr year \
## 0 hectorUI1 RCP Standard-RCP-4.5 tas Hector 1850 1858 1854
## 1 hectorUI1 RCP Standard-RCP-4.5 tas Hector 1859 1867 1863
## 2 hectorUI1 RCP Standard-RCP-4.5 tas Hector 1868 1876 1872
## 3 hectorUI1 RCP Standard-RCP-4.5 tas Hector 1877 1885 1881
## 4 hectorUI1 RCP Standard-RCP-4.5 tas Hector 1886 1894 1890
##
## fx dx
## 0 0.113869 -0.011713
## 1 0.067891 0.011509
## 2 0.141379 0.002686
## 3 0.162569 -0.029602
## 4 -0.045458 0.014252
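The chunking idea behind these columns can be illustrated with a self-contained sketch: split the annual series into consecutive 9-year windows and record each window's mean (fx) and linear trend (dx). This is a rough illustration of what chunk_ts/get_chunk_info produce, not the stitches implementation itself.

```python
import numpy as np
import pandas as pd

def chunk_info(years, values, n=9):
    """Split an annual series into consecutive n-year windows and compute
    each window's mean (fx) and linear trend (dx). A rough sketch of the
    idea behind stitches' chunk_ts/get_chunk_info; details differ."""
    rows = []
    for start in range(0, len(years) - n + 1, n):
        y = years[start:start + n]
        v = values[start:start + n]
        slope = np.polyfit(y, v, 1)[0]   # degC per year over the window
        rows.append({"start_yr": y[0], "end_yr": y[-1],
                     "year": y[n // 2], "fx": v.mean(), "dx": slope})
    return pd.DataFrame(rows)

years = np.arange(1850, 1868)             # two 9-year windows
values = 0.01 * (years - 1850) + 0.1      # synthetic warming trend
info = chunk_info(years, values)
print(info)
```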
We want to make sure that when we match to a point in an archive, that point has a netcdf logged for every variable we care about.
esm = ['CanESM5']
vars1 = ['tas', 'pr', 'hurs', 'psl']
exps = ['ssp126', 'ssp245', 'ssp370', 'ssp460', 'ssp585']
# pangeo table of ESMs for reference
pangeo_path = pkg_resources.resource_filename('stitches', 'data/pangeo_table.csv')
pangeo_data = pd.read_csv(pangeo_path)
# print(np.sort(pangeo_data.variable.unique()))
pangeo_data = pangeo_data[(pangeo_data['variable'].isin(vars1))
                          & (pangeo_data['domain'].str.contains('mon'))
                          & (pangeo_data['experiment'].isin(exps))
                          & (pangeo_data['model'].isin(esm))].copy()
pangeo_good_ensembles = []
for name, group in pangeo_data.groupby(['model', 'experiment', 'ensemble']):
df = group.drop_duplicates().copy()
if len(df) >= len(vars1):
pangeo_good_ensembles.append(df)
del(df)
pangeo_good_ensembles1 = pd.concat(pangeo_good_ensembles)
pangeo_good_ensembles2 = pangeo_good_ensembles1[['model', 'experiment', 'ensemble']].drop_duplicates().copy()
pangeo_good_ensembles = pangeo_good_ensembles2.reset_index(drop=True).copy()
del(pangeo_good_ensembles1)
del(pangeo_good_ensembles2)
path = pkg_resources.resource_filename('stitches', 'data/matching_archive.csv')
archive_data = pd.read_csv(path)
# Keep only the entries that appeared in pangeo_good_ensembles:
keys =['model', 'experiment', 'ensemble']
i1 = archive_data.set_index(keys).index
i2 = pangeo_good_ensembles.set_index(keys).index
archive_data= archive_data[i1.isin(i2)].copy()
del(i1)
del(i2)
# # Don't keep archive points with a base year of every year, just the defaults and
# # the midpoint
# # i.e. subset to just core years + 1/2 window offset years:
# years = np.concatenate((1854+9*np.array(list(range(0,28))), 1859+9*np.array(list(range(0,27)))))
# archive_data = archive_data[archive_data['year'].isin(years)].reset_index(drop=True).copy()
print(archive_data.head())
## ensemble variable model experiment start_yr end_yr year \
## 6996 r10i1p1f1 tas CanESM5 ssp126 1850 1858 1854
## 6997 r10i1p1f1 tas CanESM5 ssp126 1859 1867 1863
## 6998 r10i1p1f1 tas CanESM5 ssp126 1868 1876 1872
## 6999 r10i1p1f1 tas CanESM5 ssp126 1877 1885 1881
## 7000 r10i1p1f1 tas CanESM5 ssp126 1886 1894 1890
##
## fx dx
## 6996 -1.360399 0.016472
## 6997 -1.297474 -0.001508
## 6998 -1.200856 0.010901
## 6999 -1.218503 -0.016847
## 7000 -1.285527 0.006000
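The set_index/isin filtering above is a semi-join: keep archive rows whose (model, experiment, ensemble) combination appears in pangeo_good_ensembles. The same result can be had with pandas merge; the miniature frames below are hypothetical stand-ins for the real tables.

```python
import pandas as pd

# Hypothetical miniature stand-ins for archive_data and pangeo_good_ensembles
archive = pd.DataFrame({
    "model": ["CanESM5", "CanESM5", "MIROC6"],
    "experiment": ["ssp126", "ssp245", "ssp126"],
    "ensemble": ["r1i1p1f1", "r2i1p1f1", "r1i1p1f1"],
    "fx": [-1.36, -1.30, -0.90],
})
good = pd.DataFrame({
    "model": ["CanESM5"],
    "experiment": ["ssp126"],
    "ensemble": ["r1i1p1f1"],
})

keys = ["model", "experiment", "ensemble"]
# Semi-join: an inner merge on the deduplicated keys keeps only archive
# rows whose key combination appears in `good`.
kept = archive.merge(good[keys].drop_duplicates(), on=keys, how="inner")
print(kept)
```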
my_recipes = stitches.make_recipe(target_data, archive_data,
non_tas_variables=['pr', 'psl', 'hurs'],
tol = 0.13, N_matches = 10000)
## You have requested more recipes than possible for at least one target trajectories, returning what can
print(my_recipes.head())
## target_start_yr target_end_yr archive_experiment archive_variable \
## 0 1850 1858 historical tas
## 1 1859 1867 historical tas
## 2 1868 1876 historical tas
## 3 1877 1885 historical tas
## 4 1886 1894 historical tas
##
## archive_model archive_ensemble stitching_id \
## 0 CanESM5 r14i1p1f1 RCP Standard-RCP-4.5~hectorUI1~1
## 1 CanESM5 r24i1p1f1 RCP Standard-RCP-4.5~hectorUI1~1
## 2 CanESM5 r24i1p1f1 RCP Standard-RCP-4.5~hectorUI1~1
## 3 CanESM5 r20i1p1f1 RCP Standard-RCP-4.5~hectorUI1~1
## 4 CanESM5 r24i1p1f1 RCP Standard-RCP-4.5~hectorUI1~1
##
## archive_start_yr archive_end_yr \
## 0 2003 2011
## 1 2003 2011
## 2 2003 2011
## 3 2003 2011
## 4 2003 2011
##
## tas_file \
## 0 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 1 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 2 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 3 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 4 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
##
## hurs_file \
## 0 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 1 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 2 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 3 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 4 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
##
## pr_file \
## 0 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 1 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 2 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 3 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 4 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
##
## psl_file
## 0 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 1 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 2 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 3 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
## 4 gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical...
Unless otherwise specified via the argument
reproducible = True, the draws stitches takes from the pool
of matches are stochastic. Do another draw to demonstrate:
my_recipes2 = stitches.make_recipe(target_data, archive_data,
non_tas_variables=['pr', 'psl', 'hurs'],
tol = 0.13, N_matches = 10000)
## You have requested more recipes than possible for at least one target trajectories, returning what can
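The seeded-versus-stochastic behavior can be sketched in a few lines with numpy. This is only an analogy for what a reproducible flag does; the `draw` helper and the pool of realization ids below are hypothetical, not stitches internals.

```python
import numpy as np

# Toy pool of archive matches for one target window (hypothetical ids)
pool = ["r14i1p1f1", "r20i1p1f1", "r24i1p1f1", "r25i1p1f1"]

def draw(pool, n, seed=None):
    """Stochastic draw unless a seed is supplied; a seed makes repeated
    calls reproducible, analogous in spirit to reproducible=True."""
    rng = np.random.default_rng(seed)
    return list(rng.choice(pool, size=n, replace=False))

a = draw(pool, 2, seed=42)
b = draw(pool, 2, seed=42)
print(a, b)   # identical draws when seeded
```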
Within each of these ensembles, there's no envelope collapse. If we were to concatenate them together into a super-ensemble, there would be. So it's not as powerful as a very large ensemble of collapse-free realizations, but it's still a distinct ensemble to consider.
stitched_global_temp = stitches.gmat_stitching(my_recipes)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_stitch.py:344: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
## for name, match in rp.groupby(['stitching_id']):
print(stitched_global_temp.head())
## year value variable stitching_id
## 0 1850 0.189172 tas RCP Standard-RCP-4.5~hectorUI1~1
## 1 1851 0.302949 tas RCP Standard-RCP-4.5~hectorUI1~1
## 2 1852 0.170092 tas RCP Standard-RCP-4.5~hectorUI1~1
## 3 1853 0.091319 tas RCP Standard-RCP-4.5~hectorUI1~1
## 4 1854 0.129352 tas RCP Standard-RCP-4.5~hectorUI1~1
stitched_global_temp2 = stitches.gmat_stitching(my_recipes2)
## /Users/snyd535/.virtualenvs/r-reticulate/lib/python3.9/site-packages/stitches/fx_stitch.py:344: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
## for name, match in rp.groupby(['stitching_id']):
If you're really pressed for computing time, you can pair this with pattern scaling and get the mean spatial field. But we can do internal variability at the gridded, multivariate scale, so we do.
Using the above recipes, stitches can produce new gridded netcdf files for each recipe, for multiple variables. Creating multiple netcdfs of monthly data is slower than focusing on GSAT, but it is still much faster than an ESM, and once the recipes are set, very parallelizable. For tractability, we are just going to stitch the gridded netcdfs for one ensemble member.
# Just do one realization
recipe = my_recipes[my_recipes['stitching_id'] == 'RCP Standard-RCP-4.5~hectorUI1~1'].reset_index(drop=True).copy()
We must specify a directory for the netcdfs to be saved to; we simply do it directly here. On my computer, constructing these 4 netcdfs took ~2-3 minutes.
stitches.gridded_stitching(out_dir='data', rp =recipe)
## ['Stitching gridded netcdf for: CanESM5 tas RCP Standard-RCP-4.5~hectorUI1~1']
## ['data/stitched_CanESM5_tas_RCP Standard-RCP-4.5~hectorUI1~1.nc', 'data/stitched_CanESM5_hurs_RCP Standard-RCP-4.5~hectorUI1~1.nc', 'data/stitched_CanESM5_pr_RCP Standard-RCP-4.5~hectorUI1~1.nc', 'data/stitched_CanESM5_psl_RCP Standard-RCP-4.5~hectorUI1~1.nc']
# pull all netcdfs:
files <- list.files('data', pattern = '.nc', full.names = TRUE)
#subset to just the ESM of interest:
files <- files[grepl(py$esm, files)]
# subset to just the Hector scenario read in above:
files <- files[grepl(hector_exp, files)]
print(files)
## [1] "data/stitched_CanESM5_hurs_RCP Standard-RCP-4.5~hectorUI1~1.nc"
## [2] "data/stitched_CanESM5_pr_RCP Standard-RCP-4.5~hectorUI1~1.nc"
## [3] "data/stitched_CanESM5_psl_RCP Standard-RCP-4.5~hectorUI1~1.nc"
## [4] "data/stitched_CanESM5_tas_RCP Standard-RCP-4.5~hectorUI1~1.nc"
map_time <- '2100-12-31'
library(ncdf4)
# function to convert dates
## TODO: you'll have to adjust this depending on the calendar and units in your climate data:
# here: days since 1850-01-01 00:00:00, 365-day year
convert_time <- function(time, reference_year, nc_start_year){
time %>%
dplyr::mutate(time_index = as.integer(row.names(.)) - 1 + 12*(nc_start_year - reference_year),
month = floor(time_index %% 12) + 1,
year = floor(time_index/ 12) + reference_year,
time_id = paste0(month, '~', year )) %>%
dplyr::select(time, time_id)
}
# get the data read in and reshaped
for (v in py$vars1){
nc_name <- files[grepl(v, files)]
print(nc_name)
# open file
ncfile1 <- ncdf4::nc_open(nc_name)
# pull off lon/lat/time info
lat1 <- ncdf4::ncvar_get(ncfile1,"lat")
lon1 <- ncdf4::ncvar_get(ncfile1,"lon")
time1 <- ncdf4::ncvar_get(ncfile1, "time") # units: days since 1850-01-31 00:00:00
# get the var data as a [nlat*nlon, ntime] unlabeled matrix
var_data <- t(rbind(matrix(aperm(ncdf4::ncvar_get(ncfile1,v),c(3,1,2)),length(time1),length(lat1)*length(lon1))))
ncdf4::nc_close(ncfile1)
# add time labels
colnames(var_data) <- convert_time(time = data.frame(time=time1),
reference_year = 1850,
nc_start_year = 1850)$time_id
# add lat/lon labels and reshape to long format
grid <- expand.grid(list(lon=lon1,lat=lat1))
assign(paste0(v, 'data'),
cbind(grid, var_data) %>%
tidyr::gather(time, value, -lon, -lat) %>%
tidyr::separate(time, into = c('month', 'year'), sep = '~') %>%
dplyr::mutate(year = as.integer(year),
month=as.integer(month)))
rm(var_data)
rm(time1)
rm(lon1)
rm(lat1)
rm(grid)
}
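The index arithmetic inside convert_time can also be written in Python as a standalone sketch: a 0-based monthly index maps directly to (month, year), assuming a monthly series starting in nc_start_year. The function name month_year is illustrative, not from any package.

```python
def month_year(time_index, reference_year=1850, nc_start_year=1850):
    """Map a 0-based monthly time index to (month, year), mirroring the
    arithmetic in the R convert_time helper above."""
    i = time_index + 12 * (nc_start_year - reference_year)
    return i % 12 + 1, i // 12 + reference_year

print(month_year(0))    # (1, 1850)
print(month_year(13))   # (2, 1851)
```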
It's a lot easier to use Python to load in the netcdfs and select a grid cell or a time slice, save those off as simpler data frames rather than a full netcdf, and plot in R.
TODO: just make this a function.
py_files = pd.DataFrame(r.files, columns = ['file_name'])
# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('tas')].values[0]
tas_nc = xr.open_dataset(f[0])
# time slice map:
tas_time_slice = tas_nc.sel(time = r.map_time).copy()
# time series for single grid cell:
# lon and lat values for a grid cell near College Park, MD, home of JGCRI:
cp_lat = 38.9897
cp_lon = 180 + 76.9378 # NOTE: this is probably not right; 76.9378 deg W corresponds to 360 - 76.9378 deg E
# find the closest lat and lon grid coordinates
abslat = np.abs(tas_nc.lat - cp_lat)
abslon = np.abs(tas_nc.lon-cp_lon)
c = np.maximum(abslon, abslat)
([lon_loc], [lat_loc]) = np.where(c == np.min(c))
lon_grid = tas_nc.lon[lon_loc]
lat_grid = tas_nc.lat[lat_loc]
tas_grid_slice = tas_nc.sel(lon = lon_grid, lat=lat_grid,
time = slice('1850-01-01', '2099-12-31')).copy()
del(tas_nc)
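The nearest-cell selection above can be sketched with plain numpy on a rectangular grid, where minimizing the larger of the |lat| and |lon| offsets reduces to independent argmins along each axis. The grid values below are illustrative approximations of a CanESM5-like grid, and the sketch uses the 360 - lon_west conversion for degrees east.

```python
import numpy as np

# Toy regular grid (degrees); roughly CanESM5-shaped (64 lats x 128 lons)
lat = np.linspace(-87.86, 87.86, 64)
lon = np.linspace(0.0, 357.1875, 128)

cp_lat = 38.9897
cp_lon = 360 - 76.9378   # College Park is ~76.94 deg W -> ~283.06 deg E

# On a rectangular grid, the Chebyshev nearest cell (minimize the larger
# of the lat/lon offsets, as in the xarray code above) is just the
# independent nearest coordinate along each axis.
lat_loc = np.argmin(np.abs(lat - cp_lat))
lon_loc = np.argmin(np.abs(lon - cp_lon))
print(lat[lat_loc], lon[lon_loc])
```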
Take a look at this data in R:
# map data:
print(py$tas_time_slice$tas)
## <xarray.DataArray 'tas' (lat: 64, lon: 128)>
## array([[254.8431 , 254.7373 , 254.57207, ..., 255.20908, 255.08337, 254.97351],
## [255.91133, 255.52061, 255.24324, ..., 257.03964, 256.59912, 256.26254],
## [255.315 , 254.83546, 254.41234, ..., 256.6023 , 256.1585 , 255.75359],
## ...,
## [271.74088, 272.53 , 273.20392, ..., 267.6897 , 269.28156, 270.6536 ],
## [271.0604 , 271.45453, 271.7761 , ..., 269.08386, 269.78717, 270.53555],
## [270.7581 , 270.85852, 270.92258, ..., 270.49863, 270.57562, 270.64154]],
## dtype=float32)
## Coordinates:
## time datetime64[ns] 2100-12-31
## * lat (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
## * lon (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
## units: K
## variable: tas
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_tas_RCP Standard-RCP-4.5~hectorU...
tas_val <- py$tas_time_slice$tas$values
tas_lon <- py$tas_time_slice$lon$values
tas_lat <- py$tas_time_slice$lat$values
grid <- expand.grid(list(lon=tas_lon,lat=tas_lat))
tas <- cbind(grid,
value=t(matrix(aperm(tas_val,c(2,1)), 1,length(tas_lat)*length(tas_lon))))
as.character(py$tas_time_slice$time) %>%
substr(., 37, 43) ->
tas_time
ggplot(tas, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
ggtitle(paste(tas_time, 'tas (K)'))
# grid time series data:
print(py$tas_grid_slice$tas)
## <xarray.DataArray 'tas' (time: 3000)>
## array([269.74368, 273.0087 , 274.19226, ..., 291.71487, 278.65762, 273.592 ],
## dtype=float32)
## Coordinates:
## * time (time) datetime64[ns] 1850-01-31 1850-02-28 ... 2099-12-31
## lat float64 37.67
## lon float64 255.9
## Attributes:
## units: K
## variable: tas
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_tas_RCP Standard-RCP-4.5~hectorU...
cp_tas <- data.frame(value = py$tas_grid_slice$tas$values)
data.frame(time = as.character(py$tas_grid_slice$time$values)) %>%
separate(time, into = c('year', 'month', 'day'), sep = '-') %>%
select(-day) %>%
mutate(year = as.integer(year),
month = as.integer(month),
time_index = as.integer(row.names(.))) ->
cp_time
ggplot(cbind(cp_time, cp_tas), aes(x = time_index, y = value)) + geom_line() +
ggtitle('College Park monthly tas, 1850-2100 (K)')
ggplot(cbind(cp_time, cp_tas) %>% filter(year >= 2021), aes(x = time_index, y = value)) + geom_line() +
ggtitle('College Park monthly tas, 2021-2100 (K)')
py_files = pd.DataFrame(r.files, columns = ['file_name'])
# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('pr')].values[0]
pr_nc = xr.open_dataset(f[0])
# time slice map:
pr_time_slice = pr_nc.sel(time = r.map_time).copy()
# time series for single grid cell:
# lon and lat values for a grid cell near College Park, MD, home of JGCRI:
cp_lat = 38.9897
cp_lon = 180 + 76.9378 # NOTE: this is probably not right; 76.9378 deg W corresponds to 360 - 76.9378 deg E
# find the closest lat and lon grid coordinates
abslat = np.abs(pr_nc.lat - cp_lat)
abslon = np.abs(pr_nc.lon-cp_lon)
c = np.maximum(abslon, abslat)
([lon_loc], [lat_loc]) = np.where(c == np.min(c))
lon_grid = pr_nc.lon[lon_loc]
lat_grid = pr_nc.lat[lat_loc]
pr_grid_slice = pr_nc.sel(lon = lon_grid, lat=lat_grid,
time = slice('1850-01-01', '2099-12-31')).copy()
del(pr_nc)
Take a look at this data in R:
# map data:
print(py$pr_time_slice$pr)
## <xarray.DataArray 'pr' (lat: 64, lon: 128)>
## array([[3.386200e-06, 3.238463e-06, 3.016512e-06, ..., 3.645885e-06,
## 3.673917e-06, 3.301501e-06],
## [5.245191e-06, 4.769758e-06, 4.283163e-06, ..., 6.483961e-06,
## 5.810207e-06, 5.658150e-06],
## [3.322042e-06, 3.376860e-06, 3.215622e-06, ..., 6.009493e-06,
## 4.892620e-06, 4.685948e-06],
## ...,
## [1.786260e-05, 1.796068e-05, 1.756605e-05, ..., 3.057127e-05,
## 2.430753e-05, 1.891175e-05],
## [1.662058e-05, 1.698199e-05, 1.603339e-05, ..., 2.208773e-05,
## 2.233014e-05, 1.938729e-05],
## [1.454303e-05, 1.400501e-05, 1.383822e-05, ..., 1.702093e-05,
## 1.704664e-05, 1.702684e-05]], dtype=float32)
## Coordinates:
## time datetime64[ns] 2100-12-31
## * lat (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
## * lon (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
## units: kg m-2 s-1
## variable: pr
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_pr_RCP Standard-RCP-4.5~hectorUI...
pr_val <- py$pr_time_slice$pr$values
pr_lon <- py$pr_time_slice$lon$values
pr_lat <- py$pr_time_slice$lat$values
grid <- expand.grid(list(lon=pr_lon,lat=pr_lat))
pr <- cbind(grid,
value=t(matrix(aperm(pr_val,c(2,1)), 1,length(pr_lat)*length(pr_lon))))
as.character(py$pr_time_slice$time) %>%
substr(., 37, 43) ->
pr_time
ggplot(pr, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
ggtitle(paste(pr_time, 'pr (kg m-2 s-1)'))
# grid time series data:
print(py$pr_grid_slice$pr)
## <xarray.DataArray 'pr' (time: 3000)>
## array([7.720855e-06, 1.992142e-06, 1.111988e-05, ..., 2.953856e-06,
## 8.631851e-06, 4.242951e-06], dtype=float32)
## Coordinates:
## * time (time) datetime64[ns] 1850-01-31 1850-02-28 ... 2099-12-31
## lat float64 37.67
## lon float64 255.9
## Attributes:
## units: kg m-2 s-1
## variable: pr
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_pr_RCP Standard-RCP-4.5~hectorUI...
cp_pr <- data.frame(value = py$pr_grid_slice$pr$values)
data.frame(time = as.character(py$pr_grid_slice$time$values)) %>%
separate(time, into = c('year', 'month', 'day'), sep = '-') %>%
select(-day) %>%
mutate(year = as.integer(year),
month = as.integer(month),
time_index = as.integer(row.names(.))) ->
cp_time
ggplot(cbind(cp_time, cp_pr), aes(x = time_index, y = value)) + geom_line() +
ggtitle('College Park monthly pr (kg m-2 s-1), 1850-2100')
ggplot(cbind(cp_time, cp_pr) %>% filter(year >= 2021), aes(x = time_index, y = value)) + geom_line() +
ggtitle('College Park monthly pr (kg m-2 s-1), 2021-2100')
py_files = pd.DataFrame(r.files, columns = ['file_name'])
# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('hurs')].values[0]
hurs_nc = xr.open_dataset(f[0])
# time slice map:
hurs_time_slice = hurs_nc.sel(time = r.map_time).copy()
# time series for single grid cell:
# lon and lat values for a grid cell near College Park, MD, home of JGCRI:
cp_lat = 38.9897
cp_lon = 180 + 76.9378 # NOTE: this is probably not right; 76.9378 deg W corresponds to 360 - 76.9378 deg E
# find the closest lat and lon grid coordinates
abslat = np.abs(hurs_nc.lat - cp_lat)
abslon = np.abs(hurs_nc.lon-cp_lon)
c = np.maximum(abslon, abslat)
([lon_loc], [lat_loc]) = np.where(c == np.min(c))
lon_grid = hurs_nc.lon[lon_loc]
lat_grid = hurs_nc.lat[lat_loc]
hurs_grid_slice = hurs_nc.sel(lon = lon_grid, lat=lat_grid,
time = slice('1850-01-01', '2099-12-31')).copy()
del(hurs_nc)
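The grid-cell search above picks the cell whose larger coordinate offset (a Chebyshev-style distance) is smallest. A minimal NumPy sketch of the same logic, using small hypothetical 1-D grid axes rather than the CanESM5 grid:

```python
# Nearest-grid-cell search, as above but in plain NumPy.
# The grid values and target point here are illustrative.
import numpy as np

lats = np.array([35.0, 37.67, 40.46])    # hypothetical grid latitudes
lons = np.array([253.1, 255.9, 258.8])   # hypothetical grid longitudes
target_lat, target_lon = 38.9897, 256.94

# A cell is "close" only if it is close in BOTH lat and lon,
# so take the max of the two absolute offsets, then minimize it.
abslat = np.abs(lats - target_lat)[:, None]  # shape (nlat, 1)
abslon = np.abs(lons - target_lon)[None, :]  # shape (1, nlon)
c = np.maximum(abslat, abslon)               # shape (nlat, nlon)

lat_loc, lon_loc = np.unravel_index(np.argmin(c), c.shape)
print(lats[lat_loc], lons[lon_loc])  # nearest grid-cell coordinates
```

With xarray, `hurs_nc.sel(lat=cp_lat, lon=cp_lon, method='nearest')` would do the same lookup in one call, though it minimizes each axis offset independently.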
Take a look at this data in R:
# map data:
print(py$hurs_time_slice$hurs)
## <xarray.DataArray 'hurs' (lat: 64, lon: 128)>
## array([[89.245094, 89.51959 , 89.67603 , ..., 88.926735, 89.13366 , 89.19289 ],
## [90.81301 , 90.860435, 90.839966, ..., 90.29955 , 90.63197 , 90.74781 ],
## [91.859375, 92.097946, 92.269356, ..., 91.17945 , 91.41972 , 91.63859 ],
## ...,
## [95.520386, 94.799515, 94.592995, ..., 96.2515 , 96.32956 , 96.23767 ],
## [95.91351 , 95.84933 , 95.68604 , ..., 96.05101 , 96.09133 , 96.04043 ],
## [96.13935 , 96.15942 , 96.17313 , ..., 96.30332 , 96.311195, 96.29911 ]],
## dtype=float32)
## Coordinates:
## time datetime64[ns] 2100-12-31
## * lat (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
## * lon (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
## units: %
## variable: hurs
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_hurs_RCP Standard-RCP-4.5~hector...
hurs_val <- py$hurs_time_slice$hurs$values
hurs_lon <- py$hurs_time_slice$lon$values
hurs_lat <- py$hurs_time_slice$lat$values
grid <- expand.grid(list(lon=hurs_lon,lat=hurs_lat))
hurs <- cbind(grid,
value=t(matrix(aperm(hurs_val,c(2,1)), 1,length(hurs_lat)*length(hurs_lon))))
# extract the 'YYYY-MM' stamp from the printed time coordinate:
as.character(py$hurs_time_slice$time) %>%
substr(., 37, 43) ->
hurs_time
ggplot(hurs, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
ggtitle(paste(hurs_time, 'hurs (%)'))
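The `aperm()` + `matrix()` step above flattens the (lat, lon) value array so that lon varies fastest, matching the row order of `expand.grid(lon, lat)`. R matrices fill column-major, hence the transpose; a toy NumPy sketch (illustrative values, not model output) shows that C-order flattening gives this order directly:

```python
# Toy (2 lat x 3 lon) array: flattening with lon as the fastest-varying
# index matches expand.grid(lon, lat), which varies its first factor
# (lon) fastest. NumPy's default C order does this without a transpose.
import numpy as np

vals = np.array([[1, 2, 3],    # lat[0], one value per lon
                 [4, 5, 6]])   # lat[1]
flat = vals.flatten()          # C order: lon varies fastest
print(flat)
```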
# grid time series data:
print(py$hurs_grid_slice$hurs)
## <xarray.DataArray 'hurs' (time: 3000)>
## array([69.31585 , 58.68924 , 70.73456 , ..., 33.22224 , 49.70032 , 50.694588],
## dtype=float32)
## Coordinates:
## * time (time) datetime64[ns] 1850-01-31 1850-02-28 ... 2099-12-31
## lat float64 37.67
## lon float64 255.9
## Attributes:
## units: %
## variable: hurs
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_hurs_RCP Standard-RCP-4.5~hector...
cp_hurs <- data.frame(value = py$hurs_grid_slice$hurs$values)
data.frame(time = as.character(py$hurs_grid_slice$time$values)) %>%
separate(time, into = c('year', 'month', 'day'), sep = '-') %>%
select(-day) %>%
mutate(year = as.integer(year),
month = as.integer(month),
time_index = as.integer(row.names(.))) ->
cp_time
ggplot(cbind(cp_time, cp_hurs), aes(x = time_index, y = value)) + geom_line() +
ggtitle('College Park monthly hurs (%), 1850-2100')
ggplot(cbind(cp_time, cp_hurs) %>% filter(year >= 2021), aes(x = time_index, y = value)) + geom_line() +
ggtitle('College Park monthly hurs (%), 2021-2100')
For sea level pressure (psl) we plot only the map; we skip the College Park grid-cell time series for this variable.
py_files = pd.DataFrame(r.files, columns = ['file_name'])
# load the netcdf:
f = py_files.loc[py_files['file_name'].str.contains('psl')].values[0]
psl_nc = xr.open_dataset(f[0])
# time slice map:
psl_time_slice = psl_nc.sel(time = r.map_time).copy()
del(psl_nc)
Take a look at this data in R:
# map data:
print(py$psl_time_slice$psl)
## <xarray.DataArray 'psl' (lat: 64, lon: 128)>
## array([[ 98851.45 , 98860.33 , 98870.234, ..., 98832.06 , 98837.234,
## 98843.73 ],
## [ 98777.375, 98818.11 , 98858.99 , ..., 98667.055, 98700.68 ,
## 98737.84 ],
## [ 98885.29 , 98927.47 , 98962.516, ..., 98738.74 , 98787.6 ,
## 98837.7 ],
## ...,
## [100282.2 , 100281.23 , 100289.21 , ..., 100350.28 , 100316.06 ,
## 100293.5 ],
## [100591.66 , 100602.336, 100616.016, ..., 100580.99 , 100580.75 ,
## 100584.36 ],
## [100882.72 , 100889.836, 100897.44 , ..., 100864.52 , 100870.03 ,
## 100876.1 ]], dtype=float32)
## Coordinates:
## time datetime64[ns] 2100-12-31
## * lat (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 79.53 82.31 85.1 87.86
## * lon (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
## Attributes:
## units: Pa
## variable: psl
## experiment: historical
## ensemble: r14i1p1f1
## model: CanESM5
## stitching_id: RCP Standard-RCP-4.5~hectorUI1~1
## recipe_location: data/stitched_CanESM5_psl_RCP Standard-RCP-4.5~hectorU...
psl_val <- py$psl_time_slice$psl$values
psl_lon <- py$psl_time_slice$lon$values
psl_lat <- py$psl_time_slice$lat$values
grid <- expand.grid(list(lon=psl_lon,lat=psl_lat))
psl <- cbind(grid,
value=t(matrix(aperm(psl_val,c(2,1)), 1,length(psl_lat)*length(psl_lon))))
# extract the 'YYYY-MM' stamp from the printed time coordinate:
as.character(py$psl_time_slice$time) %>%
substr(., 37, 43) ->
psl_time
ggplot(psl, aes(x = lon, y = lat, color=value, fill = value)) + geom_tile() +
ggtitle(paste(psl_time, 'psl (Pa)'))